
  • creating a new variable out of *one standard deviation above* its mean

    Hi, I am trying to create a new variable that captures values one standard deviation above the mean. I used some codes, it did not help. For example, how can I generate a new variable capturing math grades one standard deviation above the mean? Thank you!

  • #2
    This may help:

    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . su mpg
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
             mpg |         74     21.2973    5.785503         12         41
    
    . gen himpg = mpg > (r(mean) + r(sd)) if mpg < .
    
    . tab himpg
    
          himpg |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         63       85.14       85.14
              1 |         11       14.86      100.00
    ------------+-----------------------------------
          Total |         74      100.00



    • #3
      Your question is not very clear. For example, the sentence "I used some codes, it did not help." contains two really big problems. First, you did not tell us what code you used; we cannot fix unknown code. Second, "it did not help" could mean many things: it gave an error message, it gave results but not the ones I want, it did nothing, etc. So a good question will tell us:
      • what your data looks like, by presenting an example (see help dataex),
      • what you want to achieve. In your case it is unclear what your new variable is supposed to look like. Is it the value when it exceeds the cutoff and missing otherwise, is it supposed to be an indicator/dummy variable, or something else?
      • what you have tried. Give us the exact code.
      • why the results are not what you want. Don't assume that is obvious; very often it is not.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
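
To make the two readings of "new variable" in the list above concrete, here is a minimal sketch using the auto data from #2. The names above_ind and mpg_above are made up for illustration, and which reading is wanted is exactly what the question needs to state.

```stata
sysuse auto, clear
quietly summarize mpg

* Reading 1: a 0/1 indicator, missing where mpg is missing
generate byte above_ind = mpg > r(mean) + r(sd) if !missing(mpg)

* Reading 2: the value itself when it exceeds the cutoff, missing otherwise
generate mpg_above = mpg if mpg > r(mean) + r(sd) & !missing(mpg)

tabulate above_ind
summarize mpg_above
```

Both generate lines can use r(mean) and r(sd) because generate does not replace the results left behind by summarize.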



      • #4
        I have a variable like this.


            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
                 var |     27,297    1.131764    .3169751          1          4

        I used the code above and got a result that does not make sense:

        gen hi_var = var > (r(mean) + r(sd)) if var < .


        . tab hi_var


             hi_var |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |     27,297      100.00      100.00
        ------------+-----------------------------------
              Total |     27,297      100.00


        What could be the problem?
        Last edited by ishak celik; 23 Feb 2022, 08:14.



        • #5
          Hi, I would like to create a dummy variable for "one standard deviation above the mean". I checked the forum beforehand and tried some code that I saw. Here is the variable, for example:

          . summ var1

              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                  var1 |     26,004     1.21038    .4415799          1          5

          When I use this code: gen hi_var1 = var1 > (r(mean) + r(sd)) if var1 < .
          I get a result that looks like this:

          . tab hi_var1

              hi_var1 |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |     26,004      100.00      100.00
          ------------+-----------------------------------
                Total |     26,004      100.00
          It seems that this code does not give me a dummy variable, and I don't know why. Can you suggest a solution?



          • #6
            Is that a continuous variable or an ordered variable with values 1, 2, 3, 4, or 5 (maybe a Likert scale)? If the latter is the case, then using mean + standard deviation as a cut-off makes little sense, especially since more substantively interesting cut-offs are then often readily available.

            The code you used has to run immediately after the sum command, as it is the sum command that temporarily creates r(mean) and r(sd). If they do not exist when the gen command runs, r(mean) + r(sd) evaluates to missing; Stata treats missing as larger than any number, so the comparison is false for every observation and the new variable is 0 throughout.
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------
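
One way to avoid depending on command order is to copy the results into scalars right after sum. The following is only a sketch on the auto data from #2; the scalar names mpg_mean and mpg_sd are made up for illustration, and the count command stands in for any r-class command that might run in between.

```stata
sysuse auto, clear
quietly summarize mpg

* Save the results immediately; scalars survive later commands
scalar mpg_mean = r(mean)
scalar mpg_sd   = r(sd)

* Any other r-class command replaces the r() results of summarize
quietly count if foreign == 1
display r(mean)    // now missing

* The saved scalars still hold the cutoff ingredients
generate byte himpg = mpg > (mpg_mean + mpg_sd) if !missing(mpg)
tabulate himpg
```

Choose scalar names that cannot be confused with a variable name or its abbreviation, because Stata gives variables precedence in expressions.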



            • #7
              OK. 1. The variable is an averaged index of ordinal-level variables (5 questions measuring depressive symptoms on a 5-point Likert-type frequency scale). 2. When I run it right after the sum command, it gives a result. Thanks.



              • #8
                Dear Nick Cox

                I have a similar issue to post #1: https://www.statalist.org/forums/for...an#post1651574

                I want to generate a new variable that categorizes a numeric variable into
                "average" (mean)
                "high" (1.5 standard deviations above the mean)
                "low" (1.5 standard deviations below the mean)

                It would be a great help if you could provide the Stata code for that using your example (below).

                Thank you very much.


                Originally posted by Nick Cox
                This may help:

                Code:
                . sysuse auto, clear
                (1978 automobile data)

                . su mpg

                    Variable |        Obs        Mean    Std. dev.       Min        Max
                -------------+---------------------------------------------------------
                         mpg |         74     21.2973    5.785503         12         41

                . gen himpg = mpg > (r(mean) + r(sd)) if mpg < .

                . tab himpg

                      himpg |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          0 |         63       85.14       85.14
                          1 |         11       14.86      100.00
                ------------+-----------------------------------
                      Total |         74      100.00



                • #9
                  Code:
                  . sysuse auto, clear
                  (1978 automobile data)
                  
                  . su mpg
                  
                      Variable |        Obs        Mean    Std. dev.       Min        Max
                  -------------+---------------------------------------------------------
                           mpg |         74     21.2973    5.785503         12         41
                  
                  . gen wanted = cond(mpg < r(mean) - 1.5 * r(sd), 1, cond(mpg < r(mean) + 1.5 * r(sd), 2, 3)) if mpg <   .
                  
                  . tab mpg wanted, missing
                  
                     Mileage |              wanted
                       (mpg) |         1          2          3 |     Total
                  -----------+---------------------------------+----------
                          12 |         2          0          0 |         2 
                          14 |         0          6          0 |         6 
                          15 |         0          2          0 |         2 
                          16 |         0          4          0 |         4 
                          17 |         0          4          0 |         4 
                          18 |         0          9          0 |         9 
                          19 |         0          8          0 |         8 
                          20 |         0          3          0 |         3 
                          21 |         0          5          0 |         5 
                          22 |         0          5          0 |         5 
                          23 |         0          3          0 |         3 
                          24 |         0          4          0 |         4 
                          25 |         0          5          0 |         5 
                          26 |         0          3          0 |         3 
                          28 |         0          3          0 |         3 
                          29 |         0          1          0 |         1 
                          30 |         0          0          2 |         2 
                          31 |         0          0          1 |         1 
                          34 |         0          0          1 |         1 
                          35 |         0          0          2 |         2 
                          41 |         0          0          1 |         1 
                  -----------+---------------------------------+----------
                       Total |         2         65          7 |        74 
                  
                  . 
                  
                  .  scatter wanted mpg, ms(none) mlab(wanted) mlabpos(0) yla( 1 2 3) 
                  .
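
If the three groups should show up as "low", "average" and "high" rather than 1, 2 and 3, value labels are one option. This is a sketch that repeats the coding above; the label name sdgroup is made up for illustration, and "average" here means within 1.5 SD of the mean.

```stata
sysuse auto, clear
quietly summarize mpg

* Same three-way coding as above: 1 = low, 2 = average, 3 = high
generate byte wanted = cond(mpg < r(mean) - 1.5 * r(sd), 1, ///
    cond(mpg < r(mean) + 1.5 * r(sd), 2, 3)) if !missing(mpg)

* Attach readable labels to the numeric codes
label define sdgroup 1 "low" 2 "average" 3 "high"
label values wanted sdgroup

tabulate wanted
```

The underlying values stay numeric, so conditions such as if wanted == 3 keep working.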



                  • #10
                    Thank you very much!



                    • #11
                      However, nothing special ever happens 1.5 SD away from the mean so far as I know, so you're throwing away information....
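
A sketch of one way to keep that information: store the standardized score itself and derive a cutoff-based indicator only when one is really needed. It assumes the auto data used earlier in the thread; the names z_mpg and himpg_z are made up for illustration.

```stata
sysuse auto, clear

* Standardized score: (mpg - mean) / sd, missing where mpg is missing
egen z_mpg = std(mpg)
summarize z_mpg

* An indicator can still be derived from the score if required
generate byte himpg_z = z_mpg > 1 if !missing(z_mpg)
tabulate himpg_z
```

The continuous z_mpg can go into a model directly, whereas the indicator only records which side of the cutoff each observation falls on.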

